Locked constraint satisfaction problems
We introduce and study the random 'locked' constraint satisfaction problems. When increasing
the density of constraints, they display a broad 'clustered' phase in which the space of solutions is 
divided into many isolated points. While the phase diagram can be found easily, these problems, 
in their clustered phase, are extremely hard from the algorithmic point of view: the best known 
algorithms all fail to find solutions. We thus propose new benchmarks of really hard optimization 
problems and provide insight into the origin of their typical hardness. 

PACS numbers: 89.70.Eg,75.10.Nr,64.70.P- 



Constraint satisfaction problems (CSPs) are one of the main building blocks of complex
systems studied in computer science, information theory and statistical physics. Their
wide range of applicability arises from their very general nature: given a set of N
discrete variables subject to M constraints, the CSP consists in deciding whether there
exists an assignment of variables which simultaneously satisfies all the constraints.
In computer science CSPs are at the core of computational complexity studies: the
satisfiability of boolean formulas is the canonical example of an intrinsically hard,
NP-complete problem [1]. In information theory, error-correcting codes also rely on
CSPs: the transmitted information is encoded into a codeword satisfying a set of
constraints, so that the information may be retrieved after transmission through a
noisy channel, using the knowledge of the constraints. Many other practical problems,
in scheduling a collection of tasks or in hardware and software verification and
testing, are viewed as CSPs. In statistical physics the interest in CSPs stems from
their close relation with the theory of spin glasses. Determining whether frustration
is avoidable in a system is a first, and sometimes highly nontrivial, step in
understanding its low-temperature behavior.

Methods of statistical physics provide powerful tools to study statistical properties
of CSPs. The mean field approach is known to be exact if the underlying graph of
constraints [J] is either fully connected or locally tree-like. It also has algorithmic,
and practical, consequences: in contrast with the usual situation in physics, CSPs on a
locally tree-like graph are used in practice, for instance in low density parity check
codes [5], which are among the best error-correcting codes known.

Many CSPs are NP-complete. Nevertheless, large classes of instances can be easy to
solve. It is one of the main goals of theoretical computer science to understand why
some instances are harder than others, where the hardness comes from and how to avoid
it, beat it or use it. The random K-satisfiability (K-SAT) problem, where clauses are
chosen uniformly at random among all possible ones, has played a prominent role in
approaching this goal. In random K-SAT there exists a sharp satisfiability threshold.
This is a phase transition point separating a 'SAT' phase with a low density of
constraints, where instances are almost always satisfiable, from an 'UNSAT' phase
where, with high probability, there is no solution to the CSP (jl, 0]. The hardest
instances lie near this threshold [1, 0]. The main insight came from statistical
physics studies, which make it possible to describe the structure of the space of
solutions of the random K-SAT problem. The most interesting result is the existence of
an intermediate 'clustered' phase, just below the SAT-UNSAT threshold, where the space
of solutions splits into well separated clusters. A major open question is whether and
how the existence of clusters makes the problem harder. The survey propagation
algorithm, which explicitly takes the clusters into account, is the best known solver
very close to the SAT-UNSAT threshold [13], but some local search algorithms also
perform well inside the clustered phase [rn . EH-. Another proposition that has been
put forward is that solutions in clusters with frozen variables, i.e. variables taking
the same value in the whole cluster, are hard to find. It was shown in [19] that, even
if solutions belonging to clusters without frozen variables are exponentially rare,
some message passing algorithms may be able to find them.

In this letter we introduce and study a broad class of CSPs which are extremely frozen
problems: every cluster consists of a single configuration, thus all the variables are
frozen in every cluster. We show that these problems are extremely difficult from an
algorithmic point of view: all the best known algorithms fail to solve them in this
clustered phase. At the same time the description of their phase diagram can be carried
out in detail with relatively simple statistical physics methods.

Definition - We define an occupation CSP over N binary variables
$s_1, \ldots, s_N \in \{0,1\}$ as follows: each constraint a connects to K randomly
chosen variables, and its status depends on the sum r of these variables. The
constraint is characterized by a (K+1)-component vector $A = (A_0 A_1 \cdots A_K)$,
with $A_r \in \{0,1\}$: it is satisfied if and only if $A_r = 1$. We shall study here
homogeneous models in which all constraints connect to the same number K of variables
and are characterized by the same vector A. According to [20] the occupation CSPs are
NP-complete if K > 2, $A_0 = A_K = 0$ and A is not a parity check. The locked
occupation problems (LOPs) are occupation CSPs satisfying two conditions: (a) for all
$i = 0, \ldots, K-1$ the product $A_i A_{i+1} = 0$, (b) all variables are present in at
least two constraints. Simple examples of LOPs are positive 1-in-3 satisfiability [21],
A = 0100, or parity checks, A = 01010, on graphs without leaves. In order to go from
one solution (satisfying assignment) of a LOP to another one, it is necessary to flip
at least a closed loop of variables in the factor graph representation of 0]. This lies
at the root of the crucial property that clusters are point-like and separated by an
extensive distance when the density of constraints is large enough (above $l_d$).
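
The definition above can be illustrated on a toy instance. The following is a minimal sketch, not the authors' code: the helper names (`is_satisfied`, `is_locked_vector`, `min_degree_two`, `solutions`) and the six-variable instance `TOY` are assumptions made for illustration, and the brute-force enumeration is only practical for very small N.

```python
from itertools import product

def is_satisfied(A, values):
    """A constraint with vector A is satisfied iff A[r] == '1', r = number of ones."""
    return A[sum(values)] == "1"

def is_locked_vector(A):
    """Condition (a): no two consecutive entries of A are both 1."""
    return all(not (A[i] == "1" and A[i + 1] == "1") for i in range(len(A) - 1))

def min_degree_two(n_vars, constraints):
    """Condition (b): every variable appears in at least two constraints."""
    degree = [0] * n_vars
    for c in constraints:
        for i in c:
            degree[i] += 1
    return all(d >= 2 for d in degree)

def solutions(A, n_vars, constraints):
    """Brute-force list of all satisfying assignments (tiny instances only)."""
    return [s for s in product((0, 1), repeat=n_vars)
            if all(is_satisfied(A, [s[i] for i in c]) for c in constraints)]

# Hypothetical toy instance: positive 1-in-3 SAT (A = 0100), six variables,
# four constraints, every variable of degree two.
TOY = [(0, 1, 2), (3, 4, 5), (0, 1, 3), (2, 4, 5)]
sols = solutions("0100", 6, TOY)

# Smallest Hamming distance between two distinct solutions.
min_dist = min(sum(x != y for x, y in zip(s, t))
               for k, s in enumerate(sols) for t in sols[k + 1:])
print(len(sols), min_dist)
```

On this instance no two solutions differ in a single variable: the closest pairs differ in two variables that share both of their constraints, i.e. a closed loop in the factor graph.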

In order to fully characterize a random LOP ensemble, one needs to define the degree
distribution of variables. We will study here two ensembles: the regular ensemble,
where every variable appears in exactly L constraints, and the truncated Poisson
ensemble with degree distribution $Q(0) = Q(1) = 0$,
$Q(l) = e^{-c} c^l / \{ l!\, [1 - (1+c) e^{-c}] \}$ for $l \ge 2$, and average
connectivity $\bar{l} = c (1 - e^{-c}) / [1 - (1+c) e^{-c}]$.
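
As a quick numerical check of the truncated Poisson ensemble, the sketch below evaluates $Q(l)$ and the average connectivity, and inverts the latter by bisection to find the parameter c for a target connectivity. The function names and the bisection bounds are assumptions of this illustration.

```python
import math

def truncated_poisson_pmf(l, c):
    """Q(l): Poisson distribution of parameter c conditioned on l >= 2."""
    if l < 2:
        return 0.0
    norm = 1.0 - (1.0 + c) * math.exp(-c)
    return math.exp(-c) * c**l / (math.factorial(l) * norm)

def mean_degree(c):
    """Average connectivity of the truncated Poisson ensemble."""
    return c * (1.0 - math.exp(-c)) / (1.0 - (1.0 + c) * math.exp(-c))

def solve_c(target_mean, lo=1e-3, hi=50.0):
    """Bisection for c such that mean_degree(c) equals target_mean.

    mean_degree increases with c and tends to 2 as c -> 0, so the target
    must lie above 2 (and below mean_degree(hi)).
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_degree(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Parameter c giving an average connectivity of 3.068.
c_example = solve_c(3.068)
print(c_example, mean_degree(c_example))
```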

Phase diagram - Denoting by a, b, ... the indices of constraints and by i, j, ... those
of variables, the belief propagation (BP) equations [22], eqs. (1, 2), are written in
terms of the following quantities: $\partial a$ are all the variables appearing in
constraint a, and $\partial i$ all the constraints in which variable i appears;
$\psi^{j \to a}_{s_j}$ is the probability that spin j takes value $s_j$ when a was
removed from the graph; and Z are normalization constants. The BP entropy (the
logarithm of the number of configurations satisfying all constraints, divided by N) is
given by eq. (3). In order to find a fixed point of eqs. (1, 2) and compute the
quenched average of the entropy we use the population dynamics technique, with
population sizes of order $10^4$ to $10^5$. It turns out that this procedure always
converges to the same fixed point.
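
For orientation, BP on an occupation CSP can be written in the following standard form. This is a sketch assembled from the definitions above (messages $\psi$, neighborhoods $\partial a$ and $\partial i$, normalizations $Z$) and from the general Bethe expression for the entropy; the symbol $l_i$ for the degree of variable i is an assumption of this sketch, and the layout need not coincide term by term with eqs. (1)-(3).

```latex
\begin{align}
\psi^{a\to i}_{s_i} &= \frac{1}{Z^{a\to i}}
   \sum_{\{s_j\},\, j\in\partial a\setminus i}
   A_{\,s_i+\sum_j s_j} \prod_{j\in\partial a\setminus i} \psi^{j\to a}_{s_j} , \\
\psi^{j\to a}_{s_j} &= \frac{1}{Z^{j\to a}}
   \prod_{b\in\partial j\setminus a} \psi^{b\to j}_{s_j} , \\
s &= \frac{1}{N}\sum_a \log Z^{a+\partial a}
   - \frac{1}{N}\sum_i (l_i-1)\log Z^{i} , \\
Z^{a+\partial a} &= \sum_{\{s_i\},\, i\in\partial a} A_{\,\sum_i s_i}
   \prod_{i\in\partial a}\ \prod_{b\in\partial i\setminus a} \psi^{b\to i}_{s_i} ,
\qquad
Z^{i} = \sum_{s\in\{0,1\}} \prod_{a\in\partial i} \psi^{a\to i}_{s} .
\end{align}
```

Since $A_r \in \{0,1\}$, the factor $A_{s_i+\sum_j s_j}$ simply restricts the sum to configurations that satisfy constraint a.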

The phase diagram of LOPs is much simpler to analyze than that of general CSPs, and can
be deduced purely from the BP analysis. This is due to the fact that, in the clustered
phase, every cluster reduces to a single isolated configuration. The survey propagation
(SP) equations [13] are then greatly simplified. Their iteration either leads to a
trivial fixed point, where every variable is in the so-called "joker" state [23], or to
a fixed point where no variable is in the "joker" state. In this second case the SP
equations reduce to the BP eqs. (1, 2), and the complexity function (logarithm of the
number of clusters) is equal to the entropy (3), in agreement with the point-like
nature of clusters. The clustered phase is then identified from the iterative stability
of this second, non-trivial fixed point. It is iteratively stable when the average
connectivity is above a threshold, $\bar{l} > l_d$, while the regime $\bar{l} < l_d$
corresponds to a 'liquid' phase. The intuitive difference between the two phases is
that in the clustered phase one has to flip an extensive number of variables to go from
one solution to another, while in the liquid phase the addition of any infinitesimal
temperature is enough to connect all solutions.

The satisfiability threshold $l_s$ is defined as follows: if the average connectivity
is $\bar{l} < l_s$ then a satisfying assignment almost surely exists (in the limit
$N \to \infty$), and if $\bar{l} > l_s$ then there is almost surely no satisfying
configuration. In LOPs we can find $l_s$ as the average connectivity at which the
replica symmetric (RS) entropy (3) becomes zero. Table I gives the values of the
clustering and satisfiability thresholds for the non-trivial LOPs with $K \le 5$.



A         name          L_s   l_d        l_s
0100      1-in-3        3     2.256(3)   2.368(4)
01000     1-in-4        3     2.442(3)   2.657(4)
00100*    2-in-4        3     2.513      2.827
01010*    odd 4-PC      4     2.856      4
010000    1-in-5        3     2.594(3)   2.901(6)
001000    2-in-5        4     2.690(3)   3.180(6)
010100    1-or-3-in-5   5     3.068(3)   4.724(6)
010010    1-or-4-in-5   4     2.408(3)   3.155(6)

TABLE I: The clustering $l_d$ and satisfiability $l_s$ thresholds in the locked
occupation problems for $K \le 5$ in the truncated Poisson ensemble. In the regular
ensemble $L_s$ is the first unsatisfiable or critical connectivity; the first clustered
case is $L_d = 3$. The error bars originate in the statistical nature of the population
dynamics technique. Symmetric LOPs where the satisfiability threshold can be computed
analytically are indicated by *.

When a LOP is symmetric, i.e., $A_r = A_{K-r}$ for all $r = 0, \ldots, K$, and this 0-1
symmetry is not spontaneously broken, the satisfiability threshold can be computed
rigorously using the first and second moment methods. The annealed entropy, defined by
$\langle Z \rangle = \exp(N s_{\rm ann})$, is:

$$ s_{\rm ann}(\bar{l}) = \log 2 + \frac{\bar{l}}{K}
   \log\left[ 2^{-K} \sum_{r=0}^{K} A_r \binom{K}{r} \right] . $$

By computing the second moment $\langle Z^2 \rangle$ and using Chebyshev's inequality,
as in [24, 25], we have shown that the annealed entropy is equal to the typical one,
thus the satisfiability threshold $l_s$ is given by $s_{\rm ann}(l_s) = 0$. Examples of
LOPs for which this works are the parity checks A = 01010, as well as
A = 00100, 0001000, 0010100, etc. Note that, for instance, A = 010010 does not belong
to this class because its 0-1 symmetry is spontaneously broken.
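
The condition $s_{\rm ann}(l_s) = 0$ can be solved in closed form. The sketch below assumes the first-moment expression $s_{\rm ann}(\bar{l}) = \log 2 + (\bar{l}/K) \log[2^{-K} \sum_r A_r \binom{K}{r}]$, which holds for a 0-1 symmetric problem whose symmetry is not spontaneously broken; the function names are hypothetical. The code cannot detect symmetry breaking, so the formula must not be applied to a case such as A = 010010.

```python
import math

def is_symmetric(A):
    """0-1 symmetry of the constraint vector: A_r == A_{K-r} for all r."""
    return A == A[::-1]

def allowed_weight(A):
    """Number of the 2^K local configurations accepted by one constraint."""
    K = len(A) - 1
    return sum(math.comb(K, r) for r in range(K + 1) if A[r] == "1")

def annealed_entropy(A, l):
    """First-moment (annealed) entropy at average connectivity l."""
    K = len(A) - 1
    return math.log(2) + (l / K) * math.log(allowed_weight(A) / 2**K)

def annealed_threshold(A):
    """Connectivity at which the annealed entropy vanishes."""
    K = len(A) - 1
    return K * math.log(2) / math.log(2**K / allowed_weight(A))

for A in ("01010", "00100"):
    print(A, is_symmetric(A), round(annealed_threshold(A), 3))
```

For the two starred entries of Table I this reproduces the quoted thresholds, 4 for the odd 4-PC and 2.827 for 2-in-4.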

Algorithms - We attempt to find solutions to LOPs in their satisfiable phase using
three algorithms which are among the best for hard random instances of the
K-satisfiability problem: belief propagation decimation (BPd) [14] (which is the same
as survey propagation [13] in LOPs), stochastic local search (SLS) [26], and reinforced
belief propagation (rBP) [27].

In BPd one uses the knowledge of marginal variable probabilities from the BP equations
in order to identify the most biased variable, fix it to its most probable value, and
reduce the problem. In K-SAT the SP decimation (which in LOPs is equivalent to BPd) has
been shown to be very efficient on very large problems, even very near the
satisfiability threshold [13]. However, in LOPs the BP decimation fails badly. For
example in the 1-or-3-in-5 SAT problem, on truncated Poisson graphs with
$M = 2 \cdot 10^4$ constraints, the probability of success is about 25% at
$\bar{l} = 2$, and already less than 5% at $\bar{l} = 2.3$, well below the clustering
threshold $l_d = 3.07$.

Although we do not know how to analyze the BPd process directly, some mechanisms
explaining the failure of the decimation strategy can be understood using the approach
of [28]. The idea is to analyze a slightly simpler decimation process, where the
variable to be fixed is chosen uniformly at random and its value is chosen according to
its exact marginal probability, which is assumed to be approximated by BP. The reduced
formula after $\theta N$ steps is equivalent to the reduced formula created by choosing
a solution uniformly at random and revealing a fraction $\theta$ of its variables. The
fraction of variables which were either revealed or are directly implied by the
revealed ones is denoted $\Phi(\theta)$. The performance of this 'uniform' BP
decimation can be understood from the shape of the function $\Phi(\theta)$, which we
have computed from the cavity method.
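
The notion of directly implied variables can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the cavity computation used in the text: it reveals a fraction of the variables of a known solution and repeatedly fixes every variable that a single constraint forces. The names `propagate` and `phi` and the toy instance `TOY` are hypothetical.

```python
import random

def propagate(A, constraints, fixed):
    """Close a partial assignment under direct implications.

    fixed maps variable -> value. A constraint forces its free variables to 0
    (resp. 1) when the only allowed number of ones among them is 0 (resp. all).
    Assumes the partial assignment is consistent with at least one solution.
    """
    fixed = dict(fixed)
    changed = True
    while changed:
        changed = False
        for c in constraints:
            free = [i for i in c if i not in fixed]
            if not free:
                continue
            r0 = sum(fixed[i] for i in c if i in fixed)
            feasible = [k for k in range(len(free) + 1) if A[r0 + k] == "1"]
            if feasible == [0]:
                forced = 0
            elif feasible == [len(free)]:
                forced = 1
            else:
                continue
            for i in free:
                fixed[i] = forced
            changed = True
    return fixed

def phi(A, constraints, solution, theta, rng):
    """Fraction of variables revealed or directly implied after revealing theta*N."""
    n = len(solution)
    revealed = rng.sample(range(n), round(theta * n))
    fixed = propagate(A, constraints, {i: solution[i] for i in revealed})
    return len(fixed) / n

# Hypothetical toy instance of positive 1-in-3 SAT (A = 0100).
TOY = [(0, 1, 2), (3, 4, 5), (0, 1, 3), (2, 4, 5)]
SOLUTION = (0, 0, 1, 1, 0, 0)
print(propagate("0100", TOY, {2: 1}))
print(phi("0100", TOY, SOLUTION, 0.5, random.Random(0)))
```

Revealing the single variable $s_2 = 1$ already implies all the others in this toy instance, a small-scale analogue of the avalanches discussed in the text.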

In Fig. 1 we show that the theoretical curve $\Phi(\theta)$ agrees with numerical
results in regular 1-or-3-in-5 SAT. At connectivity L = 3 the function has a
discontinuity at $\theta_s \approx 0.46$; thus after fixing a fraction $\theta_s$ of
variables an infinite avalanche of direct implications follows, and small errors in the
BP estimation of marginals lead to a contradiction with high probability. At
connectivity L = 2 the function $\Phi(\theta) \to 1$ at $\theta_1 \approx 0.73$. This
means that if a fraction $\theta > \theta_1$ of variables in a random solution is
revealed, the residual problem has only this single solution. Any mistake in the
previously fixed variables matters and causes a contradiction. In all the LOPs we have
studied, $\Phi(\theta)$ has one of these two fatal properties. The inset of Fig. 1
shows that, in LOPs, there is not much difference in the behaviors of BPd and this
uniform BP decimation.

FIG. 1: Uniform BP decimation in regular 1-or-3-in-5 SAT with L = 2 and L = 3: plot of
$\Phi(\theta)$, as obtained analytically (lines) and from the uniform BP decimation
(points): the two plots agree perfectly. For L = 3 the decimation fails because of
avalanches at the discontinuity of $\Phi(\theta)$; for L = 2 it fails when
$\Phi(\theta) \to 1$ for $\theta < 1$. Inset: Comparison between BPd and uniform BP
decimation. The number of directly implied variables is plotted against the number of
variables which were free just before fixing them. The two methods are very close, and
they fail at about the same value of $\theta$.

Stochastic local search (SLS) algorithms exist in many different versions and are used
in most practical cases where an exhaustive search is too time consuming. The main idea
of this family of algorithms is to perform a random walk in configuration space, trying
to minimize the number of unsatisfied constraints (the energy). In the implementation
of [17], a variable which belongs to at least one unsatisfied constraint is chosen
randomly. If flipping this variable does not increase the energy, the flip is accepted.
If it increases the energy, the flip is accepted with probability p. This is repeated
until either one finds a solution, or the number of steps per variable exceeds T. The
parameter p must be optimized. In Fig. 2 we plot the fraction of successful runs for
the 1-or-3-in-5 SAT with M constraints and p = 0.00003. Even with the largest value of
T we have not been able to solve instances with average connectivity larger than 3.05.
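
The local search rule described above can be sketched in a few lines. This is a minimal illustration and not the implementation used for the figures: the function names, the toy instance `TOY`, the parameter values and the naive recomputation of the energy at every step are all assumptions chosen for brevity.

```python
import random

def unsatisfied(A, constraints, s):
    """List of constraints violated by assignment s."""
    return [c for c in constraints if A[sum(s[i] for i in c)] != "1"]

def sls(A, n_vars, constraints, p, T, rng):
    """Random-walk search: returns a solution, or None after T steps per variable."""
    s = [rng.randint(0, 1) for _ in range(n_vars)]
    for _ in range(T * n_vars):
        unsat = unsatisfied(A, constraints, s)
        if not unsat:
            return s
        # Pick uniformly among variables that sit in at least one violated constraint.
        candidates = sorted({i for c in unsat for i in c})
        i = rng.choice(candidates)
        energy = len(unsat)
        s[i] ^= 1
        new_energy = len(unsatisfied(A, constraints, s))
        # Moves that do not raise the energy are always kept;
        # energy-raising moves are kept only with probability p.
        if new_energy > energy and rng.random() >= p:
            s[i] ^= 1  # reject the move
    return s if not unsatisfied(A, constraints, s) else None

# Hypothetical toy instance of positive 1-in-3 SAT (A = 0100).
TOY = [(0, 1, 2), (3, 4, 5), (0, 1, 3), (2, 4, 5)]
result = sls("0100", 6, TOY, p=0.1, T=2000, rng=random.Random(1))
print(result)
```

A practical implementation would update the energy incrementally, using only the constraints containing the flipped variable, rather than rescanning all constraints.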

The belief propagation reinforcement (rBP) was originally introduced in [27]. The main
idea is to add an external field $\mu^i_{s_i}$ which biases the variable i in the
direction of the marginal probability computed from the BP messages. This modifies BP
eq. (2) to
$\psi^{i \to a}_{s_i} = \mu^i_{s_i} \prod_{b \in \partial i - a} \psi^{b \to i}_{s_i} / Z^{i \to a}$.
The algorithm then works as follows: Iterate the BP equations n times. Update all the
external fields: if $\xi^i_1 < \xi^i_0$ set $\mu^i_1 = \pi^{l_i}$,
$\mu^i_0 = (1-\pi)^{l_i}$; otherwise set $\mu^i_1 = (1-\pi)^{l_i}$,
$\mu^i_0 = \pi^{l_i}$, where $\xi^i_s$ is the marginal probability of variable i
computed from the BP messages. At each iteration one checks if the most probable
configuration, given by $s_i = 0$ if $\mu^i_0 > \mu^i_1$ and $s_i = 1$ otherwise, is a
solution. If it is not, one iterates at most T times. We chose n = 2 and optimized the
value of $\pi$. In Fig. 2 we plot the fraction of successful runs for the 1-or-3-in-5
SAT with $\pi = 0.42$ for $2.8 < \bar{l} < 3$ and $\pi = 0.43$ for
$3 < \bar{l} < 3.2$. The performance is marginally better than SLS, but again one
cannot penetrate into the clustered phase.

FIG. 2: Performance of reinforced BP and stochastic local search for 1-or-3-in-5 SAT
with M constraints. The fraction of successful runs is plotted against the average
connectivity $\bar{l}$. The clustering threshold $l_d$ is marked, and the
satisfiability transition is at $l_s = 4.72$. The maximal numbers of steps per variable
T are chosen such that the running times of rBP and SLS are comparable.
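
The field update and the read-out of the most probable configuration can be isolated in two small functions. This sketch encodes the rule as read from the text, under the assumptions that $l_i$ is the degree of variable i and that $\xi^i_0$, $\xi^i_1$ are the BP estimates of the marginal of variable i; the function names are hypothetical, and the BP iteration itself is not shown.

```python
def update_fields(xi0, xi1, pi, degree):
    """Return the reinforcement fields (mu0, mu1) for one variable.

    xi0, xi1: BP marginal weights of values 0 and 1 (assumed inputs).
    pi: reinforcement parameter; degree: number of constraints of the variable.
    """
    if xi1 < xi0:
        # Value 0 is currently favored: mu1 = pi^l, mu0 = (1 - pi)^l.
        return (1.0 - pi) ** degree, pi ** degree
    # Otherwise: mu1 = (1 - pi)^l, mu0 = pi^l.
    return pi ** degree, (1.0 - pi) ** degree

def most_probable_value(mu0, mu1):
    """Read-out used to test whether the current configuration is a solution."""
    return 0 if mu0 > mu1 else 1

mu0, mu1 = update_fields(xi0=0.7, xi1=0.3, pi=0.42, degree=3)
print(mu0, mu1, most_probable_value(mu0, mu1))
```

With $\pi < 1/2$, as in the values quoted in the text, the field always reinforces the value that BP currently favors.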

We have observed the same behavior for all the LOPs we studied: the clustering
transition point $l_d$ seems to be a boundary beyond which all three of these
algorithms fail. As shown in Table I, this point can be very far from the SAT-UNSAT
transition $l_s$, meaning that there is a broad range of instances where known
algorithms are totally inefficient. The parity check problems are the exception, as
they can be solved with linear programming algorithms.

Conclusions - LOPs form a broad class of extremely hard constraint satisfaction
problems. Their phase diagram is simple: the set of satisfiable configurations becomes
clustered when the average connectivity is $\bar{l} > l_d$, and it disappears for
$\bar{l} > l_s$. These two thresholds can be computed efficiently using population
dynamics, and in the case of some symmetric problems the value of $l_s$ can be
confirmed rigorously. At the same time, the best algorithms known for random CSPs fail
to find solutions in the clustered phase $l_d < \bar{l} < l_s$. This difficulty is due
to the 'locked' nature of the problem, which reduces the clusters to single points. It
will be interesting to investigate whether LOPs might be used to design new efficient
nonlinear error-correcting codes, or whether planted LOPs are good candidates for
one-way functions in cryptography.